Ratio Edit Tolerance Development Using Variations of Exploratory Data Analysis (EDA) Resistant Fences Methods

Author

  • Katherine Jenny Thompson
Abstract

1. Introduction

Many data items collected by the Bureau of the Census Economic Programs are subjected to ratio edits. In a ratio edit, the ratio of two correlated items is compared to upper and lower bounds, known as tolerances. Reported items whose ratios fall outside the tolerances are considered edit failures, and one or both of the items in an edit-failing ratio are either imputed or flagged for analyst review. The efficiency of the ratio edit consequently depends on the selected tolerances.

In 1996, Thompson and Sigman conducted research to determine a statistical method of automatically setting tolerance limits that works well for different sets of economic data, for use in the 1997 economic census. We evaluated these methods on two sets of historical data: the 1994 Annual Survey of Manufactures (ASM) and the 1992 Business Census. In both data sets, we achieved success with some variations of an Exploratory Data Analysis (EDA) method called resistant fences. The resistant fences rules flag a ratio as an outlier when it lies more than k interquartile ranges below the first quartile or above the third quartile (k is a constant).

In the Business Census applications, the resistant fences methods worked best when the original distributions of ratios were symmetrized using a power transformation before applying the resistant fences methods (final tolerances were obtained by applying the inverse transformation to the initial limits). However, in other data sets, the symmetrizing effort has not proved worthwhile. Lanska and Kryscio (1997) propose a variation of the resistant fences rules for asymmetric distributions: instead of the interquartile range, use the distance between the first quartile and the median for the lower fence and the distance between the third quartile and the median for the upper fence, thus elongating the fences in the direction of the skewness of the distribution.
This paper investigates this method for tolerance development, in comparison with the previously described resistant methods, on several sets of simulated data. In Section 2, I describe the resistant methods investigated for tolerance development. Section 3 describes the simulated data. Section 4 presents an evaluation of these methods. Section 5 presents my recommendation.

2. Resistant Methods Used for Tolerance Development

Given an ordered distribution of ratios, let $q_{25}$ = the first quartile, $q_{75}$ = the third quartile, $m$ = the sample median, and $H = q_{75} - q_{25}$, the interquartile range. Then, Resistant Fences flag outliers as ratios less than $q_{25} - k \times H$ or …
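The fence rules described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. The function names, the quartile convention (`statistics.quantiles` with `method="inclusive"`), the choice of k, and the example ratios are all assumptions. The upper resistant fence is taken from the verbal rule in the Introduction (k interquartile ranges above the third quartile), and the asymmetric variant is only one reading of the Lanska and Kryscio description, in which the interquartile range is replaced by each quartile's distance to the median.

```python
import statistics


def quartiles(ratios):
    """Return (q25, m, q75) under one common quartile convention (an assumption)."""
    q25, m, q75 = statistics.quantiles(ratios, n=4, method="inclusive")
    return q25, m, q75


def resistant_fences(ratios, k):
    """Fences placed k interquartile ranges beyond the first and third quartiles."""
    q25, _, q75 = quartiles(ratios)
    h = q75 - q25  # interquartile range H
    return q25 - k * h, q75 + k * h


def asymmetric_fences(ratios, k):
    """Assumed asymmetric variant: each fence uses its quartile's distance to the median."""
    q25, m, q75 = quartiles(ratios)
    return q25 - k * (m - q25), q75 + k * (q75 - m)


def flag_outliers(ratios, lower, upper):
    """Return the ratios that fall strictly outside the tolerance limits."""
    return [r for r in ratios if r < lower or r > upper]


if __name__ == "__main__":
    # Hypothetical right-skewed ratios, for illustration only.
    example = [1, 2, 3, 4, 5, 7, 11, 20, 40]
    for rule in (resistant_fences, asymmetric_fences):
        lower, upper = rule(example, k=1.5)
        print(rule.__name__, lower, upper, flag_outliers(example, lower, upper))
```

With a right-skewed sample, the asymmetric rule places the upper fence farther from its quartile than the lower fence, which matches the description of fences elongated in the direction of the skew. Because the quartile-to-median distances are smaller than the interquartile range, the same k gives tighter limits under this reading, so k would need to be chosen separately for each rule.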

Similar resources

Application of C-A fractal model and exploratory data analysis (EDA) to delineate geochemical anomalies in the Takab 1:25,000 geochemical sheet, NW Iran

Most conventional statistical methods aiming at defining geochemical concentration thresholds for separating anomalies from background have limited effectiveness in areas with complex geological settings and variable lithology. In this paper, median+2MAD as a method of exploratory data analysis (EDA) and concentration-area (C-A) fractal model as two effective approaches in separation g...

Principles and Procedures of Exploratory Data Analysis

Exploratory data analysis (EDA) is a well-established statistical tradition that provides conceptual and computational tools for discovering patterns to foster hypothesis development and refinement. These tools and attitudes complement the use of significance and hypothesis tests used in confirmatory data analysis (CDA). Although EDA complements rather than replaces CDA, use of CDA without EDA ...

Web-based Exploratory Data Analysis (web-eda): Visualisation Meets Spatial Analysis

This paper provides an overview of the development in visualisation and spatial analysis. It has been found that, from paper maps to web-based visualisation, the roles of maps have changed from “display and storage” to “display and exploration” and “storage and linking”. The variables for presentations have been expanded from the traditional set of visual variables to five sets of variables, i.e. v...

The Logic of Exploratory and Confirmatory Data Analysis

Exploratory Data Analysis (EDA) and Confirmatory Data Analysis (CDA) are two statistical methods widely used in scientific research. They are typically applied in sequence: first, EDA helps form a model or a hypothesis to be tested, and then CDA provides the tools to confirm if that model or hypothesis holds true. When both analyses are applied within a single experiment, two main types of erro...

Exploratory Data Analysis: Getting to Know Your Data

In broad terms, Exploratory Data Analysis (EDA) can be defined as the numerical and graphical examination of data characteristics and relationships before formal, rigorous statistical analyses are applied. Although the temptation to omit EDA in favor of delving into ANOVAs, MANOVAs and the like is great, the role of EDA cannot be underemphasized. EDA rewards the user with a better understanding...

Publication date: 1999